



# Multi-phase Clock Generator for High-speed Wireline Systems

Shaokang Zhao

Supervisor: Prof. C. Patrick Yue

25<sup>th</sup> March 2025

Optical Wireless Lab

Department of Electronic and Computer Engineering

The Hong Kong University of Science and Technology (HKUST)

#### **Outline**

- Introduction
- Clocking in Wireline Systems
- Proposed QCG with Open-loop QEC
- Measurement Results
- Conclusion and Future Work

## **Introduction: Background**

- We are now in an era of artificial intelligence (AI)
- Hardware to support AI:
  - High performance computing (HPC) systems
  - High performance data transmission



High-bandwidth Memory IO (Chip-to-memory)



PCIe, NVLink (Inter-component, chip-to-chip)



Data Center (Server-to-server)

#### **Introduction: Wireline Transceivers**

- Wireline transceivers: transmitter (TX), channel, and receiver (RX)
- The clock signal is essential: it synchronizes the TX and RX and times their internal circuits



#### **Outline**

- Introduction
- Clocking in Wireline Systems
  - Clocking architectures
  - Review on QCG circuits
- Proposed QCG with Open-loop QEC
- Measurement Results
- Conclusion and Future Work

## Clocking Architecture

Full-rate clocking:

- Straightforward implementation
- High-power clock generation and distribution
- Bandwidth issue for clock buffers

Sub-rate clocking:

- Relieved power and bandwidth issues
- Requires multi-phase sampling clocks

Quadrature clock generation (QCG): generating 4-phase clocks from differential clocks with:

- High phase accuracy (duty cycle = 50%, quadrature phase = $90^{\circ}$)
- Least jitter contribution

Figure: QCG block converting the differential input clocks CK_IP (0°) and CK_IN (180°) into the four output phases CK_IP (0°), CK_QP (90°), CK_IN (180°), and CK_QN (270°).

- Requirements for quadrature clocks:
  - Clock jitter translates into system random jitter (RJ)
  - Clock phase error translates into system deterministic jitter (DJ)

A quarter-rate wireline transmitter using:

- (a) Ideal 4-phase sampling clocks
- (b) 4-phase sampling clocks with jitter: transitions blurred/distributed, degraded eye quality
- (c) 4-phase sampling clocks with phase error-1
- (d) 4-phase sampling clocks with phase error-2
- (e) 4-phase sampling clocks with both jitter and phase error
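As a rough illustration of why phase error shows up as deterministic jitter, the sketch below computes the four unit-interval (UI) widths that consecutive clock edges would define in a quarter-rate transmitter. The function name, the 10-GHz clock, and the 2° error are assumptions chosen for illustration, not values taken from the slides.

```python
def ui_widths_ps(clock_ghz, phases_deg):
    """Return the four UI widths (ps) set by consecutive clock edges."""
    period_ps = 1e3 / clock_ghz
    edges = [p / 360.0 * period_ps for p in phases_deg]
    edges.append(edges[0] + period_ps)  # wrap around to the next clock cycle
    return [edges[i + 1] - edges[i] for i in range(4)]


# Ideal quadrature clocks: every UI has the same width.
ideal = ui_widths_ps(10.0, [0.0, 90.0, 180.0, 270.0])

# Hypothetical 2-degree I/Q error: the UIs alternate between wide and narrow.
skewed = ui_widths_ps(10.0, [0.0, 92.0, 180.0, 272.0])
ui_spread_ps = max(skewed) - min(skewed)

print(ideal, skewed, ui_spread_ps)
```

The spread between wide and narrow UIs repeats every clock cycle, which is why it behaves as a deterministic rather than a random timing error.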

## Review on QCG Circuits: Frequency divider

Frequency divider: most commonly used QCG



## Review on QCG Circuits: Delay-Locked Loop

Delay-locked Loop (DLL)



[2] ISSCC 2022

### Review on QCG Circuits: Ring Oscillator

- Ring-oscillator-based QCG
  - Ring oscillator phase-locked loop (RO-PLL)
  - Injection-locked ring oscillator



**Injection-Locked RO QCG [4]**

- Improved jitter performance
- Open-loop, simple implementation
- Poor phase accuracy
- Sensitive to mismatch, losing lock

#### **Outline**

- Introduction
- Clocking in Wireline Systems
- Proposed QCG with Open-loop QEC
- Measurement Results
- Conclusion and Future Work

- Proposed QCG:
  - Duty cycle correction (DCC)
  - A digitally controlled delay line (DCDL)
  - A 2-stage open-loop quadrature error corrector (QEC)
  - FSM for automatic calibration

Block labels:

- S2D: Single-ended to differential
- DCC: Duty cycle correction
- DCDL: Delay line, generating θ≈90°
- PI: Phase interpolator
- QEC: Quadrature error correction











## **Proposed QCG: Circuit Implementation**

- Circuit implementation
  - A digitally controlled delay line (DCDL) generates coarse quadrature phase shift  $(\theta \approx 90^{\circ})$
  - Phase interpolators (PI) produce middle phase
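The idea behind interpolating to the middle phase can be sketched with ideal arithmetic. The model below assumes perfectly linear PIs and a reference clock CK1 at 0°, a delayed clock CK2 at θ, and the complement CKB1 at 180°; the function name and test angles are illustrative only.

```python
def qec_stage(theta_deg, diff_phase_deg=180.0):
    """Ideal one-stage QEC: each PI outputs the midpoint of its two input phases."""
    out_1 = (0.0 + theta_deg) / 2.0             # PI between CK1 (0 deg) and CK2 (theta)
    out_2 = (theta_deg + diff_phase_deg) / 2.0  # PI between CK2 (theta) and CKB1
    return out_1, out_2


# Even when the coarse delay is off by +/-10 degrees, the outputs stay 90 degrees apart.
spacings = []
for theta in (80.0, 90.0, 100.0):
    out_1, out_2 = qec_stage(theta)
    spacings.append(out_2 - out_1)

print(spacings)
```

In this idealized model the error in θ cancels in the difference of the two midpoints, so the output spacing depends only on the differential phase of the input clocks.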





## Proposed QCG: Choice of Phase Interpolator

- Commonly used PI
  - Voltage-mode PI: "wire-and" inverters
  - Current-mode PI: current-mode logic (CML) combiner
- Integrating-mode PI (IMPI) [7, ISSCC 2022]
  - Simple implementation
  - Linear combining
  - Compatible with square-wave clocks
## **Proposed QCG: Choice of Phase Interpolator**

- Operation principle of integrating-mode PI
  - Converting voltage clocks to current clocks
  - Applying a current clock to a load capacitor: triangular waveforms generated
  - Combining the CK1 and CK2 currents
  - Combining the CKB1 and CK2 currents
  - Applying the combined current clocks to a load capacitor: trapezoidal waveforms generated
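The integrating-mode principle can be checked numerically with an idealized model: two unit square-wave currents are summed, integrated on a capacitor, and the rising zero crossing of the resulting trapezoid is located. Everything here (function name, 0.1° grid, unit currents, θ = 80°) is an assumption for illustration; real current sources and loads are not modeled.

```python
def impi_crossing_deg(phi1_deg, phi2_deg, per_deg=10):
    """Rising zero crossing (deg) of the integrated sum of two square-wave currents."""
    n = 360 * per_deg  # samples per clock period
    s1, s2 = round(phi1_deg * per_deg), round(phi2_deg * per_deg)

    def current(k, start):
        # +1 for the half period that begins at 'start', otherwise -1
        return 1 if (k - start) % n < n // 2 else -1

    voltage, v = [], 0
    for k in range(n + 1):
        voltage.append(v)
        v += current(k, s1) + current(k, s2)  # capacitor integrates the summed current

    mean = sum(voltage[:n]) / n  # remove the DC level before finding the crossing
    for k in range(n):
        a, b = voltage[k] - mean, voltage[k + 1] - mean
        if a < 0 <= b:
            return (k + (-a) / (b - a)) / per_deg
    raise ValueError("no rising zero crossing found")


theta = 80.0  # hypothetical coarse DCDL phase in degrees
out_1 = impi_crossing_deg(0.0, theta)    # CK1 + CK2
out_2 = impi_crossing_deg(180.0, theta)  # CKB1 + CK2
print(out_1, out_2, out_2 - out_1)
```

In this model each crossing sits at the midpoint of the two input phases plus a common 90° integration delay, so the two outputs remain 90° apart even though θ is 10° off.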



## **Proposed QCG: PI Circuit Implementation**

- Current sources with charging and discharging phase
- Circuit implementation of integrating-mode PI
  - Charging: PMOS, discharging: NMOS
  - C<sup>2</sup>MOS logic to enable PMOS or NMOS at the pace of input clock



## **Proposed QCG: Two-stage QEC**

- Residual errors might exist after 1-stage open-loop QEC
- Additional stages can further reduce the errors
  - 2-stage QEC: considering noise and mismatch



## **Proposed QCG: QCG Performance**

- A 500-run Monte Carlo simulation evaluates the 2-stage QEC performance under mismatch
  - Duty cycle:  $\mu$ =50%,  $\sigma$ =0.32%
  - Quadrature phase:  $\mu$ =90.04°,  $\sigma$ =0.75°







# **Proposed QCG with Open-loop QEC**

- Is it good enough?
- Two major issues: duty cycle distortion and initial phase error



# **Proposed QCG: Duty Cycle Distortion-Concept**

Duty cycle distortion → Differential phase ≠ 180° → Generated quadrature phase shift ≠ 90°
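A small numerical sketch of this chain, assuming ideal midpoint interpolation between a reference phase at 0°, a delayed phase at θ, and a complementary phase that should sit at 180°. The function name and the ±4° deviations are hypothetical.

```python
def quadrature_after_qec(theta_deg, diff_phase_deg):
    """Output I/Q spacing of an ideal midpoint-interpolating QEC stage."""
    mid_1 = (0.0 + theta_deg) / 2.0
    mid_2 = (theta_deg + diff_phase_deg) / 2.0
    return mid_2 - mid_1


# Deviation of the differential phase from 180 degrees (caused by duty cycle distortion)
errors = {d: quadrature_after_qec(90.0, 180.0 + d) - 90.0 for d in (-4.0, 0.0, 4.0)}
print(errors)
```

Under these assumptions half of the differential phase deviation appears directly as quadrature error, and the interpolation cannot remove it, which motivates correcting the duty cycle first.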



# **Proposed QCG: Duty Cycle Distortion-Circuit**

- When duty cycle distortion exists in the input clocks:
  - Decreasing/increasing trend in the load voltage waveforms → push transistors into triode region



# Proposed QCG: AM-PM of the C2C Converter

- If the initial phase shift $\theta$ deviates significantly from $90^{\circ}$:
  - Large swing difference between the voltage waveforms (AMP<sub>1</sub> $\neq$ AMP<sub>2</sub>) $\rightarrow$ skew induced by the C2C AM-PM characteristics ($\theta_{D1} \neq \theta_{D2}$)



C2C: CML-to-CMOS Converter

## **Proposed QCG: Calibration**

- For optimal QCG performance, initial calibration is necessary
  - Duty cycle distortion from the input clocks
  - Quadrature delay generated by DCDL



# Proposed QCG: S2D and Duty Cycle Correction

- Single-ended duty cycle correction (DCC)
  - Pull-down/pull-up current modulating rising/falling transition time
- Single-ended-to-differential (S2D) converter



## **Proposed QCG: Calibration**

- Digital automatic calibration scheme:
  - Error detection circuits (DCD, QED)
  - FSM for performing automatic calibration



# **Proposed QCG: Digital Calibration**

Digital Calibration: error detection circuits + FSM



# **Proposed QCG: Duty Cycle Detection**

- RC low-pass filters for duty cycle detection
  - Extract the DC levels of the P and N clocks, which represent the duty cycle
  - Sensitivity: 20.1 mV per 1% duty cycle; zero-crossing point std. dev.: 8.2 mV
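A first-order sketch of the detection idea, assuming rail-to-rail clocks, ideal averaging by the RC filters, and a 0.9-V supply. The function name is hypothetical and the actual circuit is not modeled, so the sensitivity it yields is only an order-of-magnitude check against the figure quoted above.

```python
def dcd_output_mv(duty_pct, vdd=0.9):
    """Differential DC level (mV) of ideally filtered P and N clocks."""
    d = duty_pct / 100.0
    v_p = vdd * d          # DC level of the P clock
    v_n = vdd * (1.0 - d)  # DC level of the complementary N clock
    return (v_p - v_n) * 1e3


balanced = dcd_output_mv(50.0)                # zero at 50 % duty cycle
sensitivity = dcd_output_mv(51.0) - balanced  # mV per 1 % duty cycle
print(balanced, sensitivity)
```

The differential output crosses zero at exactly 50% duty cycle, so a comparator on the two filtered levels can tell which direction the correction should move.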



# Proposed QCG: Quadrature Error Detection

- A passive phase mixer (passive XOR) converts the quadrature error to a DC voltage
  - Sensitivity: 10.1 mV/°; zero-crossing point std. dev.: 13.2 mV
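The XOR-based detection can be illustrated with ideal 50% square waves. The sketch assumes a differential XOR/XNOR output, rail-to-rail swing, and a 0.9-V supply; these are modeling assumptions, not details confirmed by the slides.

```python
def xor_dc_mv(phase_deg, vdd=0.9, steps=3600):
    """Differential DC output (mV) of an ideal XOR/XNOR pair for a given I-Q phase."""
    shift = round(phase_deg * steps / 360.0)
    high = 0
    for k in range(steps):
        ck_i = k < steps // 2                    # I clock high for the first half period
        ck_q = (k - shift) % steps < steps // 2  # Q clock is the I clock delayed by 'shift'
        high += (ck_i != ck_q)                   # XOR is high when the clocks differ
    frac = high / steps                          # fraction of the period where XOR is high
    return vdd * (2.0 * frac - 1.0) * 1e3


at_quadrature = xor_dc_mv(90.0)
per_degree = xor_dc_mv(91.0) - at_quadrature
print(at_quadrature, per_degree)
```

In this model the output is zero at exactly 90° and changes linearly with phase, so its sign indicates whether the I-Q spacing is too large or too small.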



# **Proposed QCG: Comparator**

- Comparators slice the detection output to digital "1" and "0"
  - 1: above target; 0: below target
- Comparator with offset cancellation
  - Input offset standard deviation = 400 µV<sub>rms</sub>, equivalent to ~0.02% duty cycle error and ~0.04° phase error
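The equivalent errors follow from dividing the comparator offset by the detector sensitivities quoted on the detection slides (20.1 mV per 1% duty cycle and 10.1 mV/°). A quick arithmetic check, with variable names chosen for illustration:

```python
OFFSET_MV = 0.4             # 400 uV rms comparator input offset
DCD_GAIN_MV_PER_PCT = 20.1  # duty cycle detector sensitivity
QED_GAIN_MV_PER_DEG = 10.1  # quadrature error detector sensitivity

duty_error_pct = OFFSET_MV / DCD_GAIN_MV_PER_PCT
phase_error_deg = OFFSET_MV / QED_GAIN_MV_PER_DEG

print(f"{duty_error_pct:.3f} % duty cycle, {phase_error_deg:.3f} deg phase")
```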



## **Proposed QCG: FSM**

- FSM without disabling strategy
  - Parameters keep being updated near the optimal value, generating spurs



- FSM with a pattern-detecting disabling strategy
  - Successive detection outputs are recorded; when the output toggles between 1 and 0 for M times, the calibration is disabled
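One way such a disabling strategy could behave is sketched below as a bang-bang loop that freezes the control code after M consecutive toggles of the detector bit. The function, its parameters, the 6-bit code range, and the toy detector are all hypothetical; the actual FSM may differ.

```python
def calibrate(detect, code=0, code_max=63, m_toggles=4, max_steps=1000):
    """Step the code by the detector bit; stop after M consecutive toggles."""
    last, toggles = None, 0
    for _ in range(max_steps):
        bit = detect(code)  # 1: above target, 0: below target
        if last is not None:
            toggles = toggles + 1 if bit != last else 0
            if toggles >= m_toggles:
                return code  # dithering around the optimum: disable calibration
        last = bit
        step = -1 if bit else 1
        code = max(0, min(code_max, code + step))
    return code


# Toy detector: the measured quantity rises with the code; the target lies between 20 and 21.
final_code = calibrate(lambda c: 1 if c > 20.5 else 0)
print(final_code)
```

Once the loop returns, the code is held constant, so the parameter no longer dithers around the optimum and the associated spurs are avoided.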



# **Proposed QCG: Design Summary**

- Proposed QCG system:
  - Duty cycle correction (DCC) to calibrate input duty cycle errors
  - A digitally controlled delay line (DCDL) for generating coarse quadrature phase
  - A 2-stage open-loop quadrature error corrector (QEC) to reduce residual errors
  - FSM for automatic calibration



# Measurement Result: Chip Micrograph

- The prototype chip is fabricated in TSMC 28-nm CMOS technology
- Area: 12100 µm² (with FSM and comparators), 3300 µm² (core area excluding digital blocks)



# Measurement Result: Setup

- Measurement setup:
  - Phase error testing
  - Phase noise testing



# **Measurement Result: Phase Accuracy**

- Reference-to-I delay (t<sub>REF-I</sub>) and reference-to-Q delay (t<sub>REF-Q</sub>) are measured separately
- Quadrature phase delay: t<sub>I-Q</sub> = t<sub>REF-Q</sub> − t<sub>REF-I</sub>
- Measured phase error ≤1.8° from 5 to 10 GHz across 8 chip samples
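Converting the two measured delays into a quadrature phase is a matter of subtracting them and scaling by the clock frequency. The sketch below uses made-up delay values purely to show the arithmetic; the function name is hypothetical.

```python
def iq_phase_deg(t_ref_i_ps, t_ref_q_ps, clock_ghz):
    """Quadrature phase (deg) from reference-to-I and reference-to-Q delays."""
    t_iq_ps = t_ref_q_ps - t_ref_i_ps
    return 360.0 * t_iq_ps * 1e-3 * clock_ghz  # ps * GHz = 1e-3 clock cycles


# Hypothetical readings at 10 GHz (not measured data)
phase = iq_phase_deg(t_ref_i_ps=12.0, t_ref_q_ps=37.3, clock_ghz=10.0)
phase_error = phase - 90.0
print(phase, phase_error)
```

Because both delays are referenced to the same clock, any common delay cancels in the subtraction.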



## Measurement Result: Phase Noise/RMS Jitter

- Integrated jitter (10 kHz to 1 GHz) = 61.09 fs<sub>rms</sub> (Q-phase), 59.56 fs<sub>rms</sub> (I-phase)
- Reference jitter =  $41.36 \text{ fs}_{rms}$ , calculated jitter contribution =  $45 \text{ fs}_{rms}$
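The jitter contribution figure is consistent with a root-sum-square subtraction of the reference jitter from the measured output jitter, assuming the two noise sources are uncorrelated. That this is how the value was calculated is an assumption; the sketch simply reproduces the arithmetic.

```python
import math


def added_jitter_fs(total_fs, reference_fs):
    """Jitter added by the circuit, assuming uncorrelated noise (RSS subtraction)."""
    return math.sqrt(total_fs ** 2 - reference_fs ** 2)


q_phase_added = added_jitter_fs(61.09, 41.36)
i_phase_added = added_jitter_fs(59.56, 41.36)
print(f"Q: {q_phase_added:.2f} fs, I: {i_phase_added:.2f} fs")
```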



### **Measurement Result: Power Breakdown**

- Total power consumption: 10.2 mW at 10-GHz operation
- Negligible digital power (~250 µW) while calibration is active





# **Comparison Table**

#### Other open-loop methods

| | This work | Columbia JSSC'23 [1] | Columbia JSSC'22 [2] | Xilinx ISSCC'18 [3] | Intel ESSCIRC'16 [4] | UT Dallas CICC'11 [5] | IBM ISSCC'08 [6] |
|---|---|---|---|---|---|---|---|
| Architecture | Open-loop | DLL | DLL+IL-QLL | IL-QLL | Open-loop | Open-loop | Open-loop |
| Calibration | Digital auto. calibration | Manual | Manual | Frequency track loop | Digital auto. calibration | None | None |
| Number of phases | 4 | 4 | 8 | 8 | 4 | 8 | 4 |
| Process | 28nm CMOS | 65nm CMOS | 65nm CMOS | 7nm FinFET | 28nm CMOS | 65nm CMOS | 65nm CMOS |
| Frequency Range (GHz) | 5~10 | 3.5~11 | 5~8 | 4~16 | 1~2.6 | 8~12 | 0.37~2.5 |
| Power (mW) | 10.2 @10 GHz | 7.8 @7 GHz | 15.6 @7 GHz | 10 @16 GHz | 4.4 @2 GHz | 14.8 @10 GHz | 2.6 @2.5 GHz |
| Power Efficiency (mW/GHz) | 1.02 | 1.11 | 2.23 | 0.63 | 2.2 | 1.48 | 1.04 |
| Jitter (fs<sub>rms</sub>) | 62.1 @10 GHz | 48.1 @7 GHz | 65.2 @7 GHz | 80 @16 GHz | 37.6 | 470 @10 GHz | N/A |
| Integration Band (Hz) | 10k-1G | 10k-1G | 10k-1G | 100k-1G | 10k-100M | N/A | N/A |
| IQ Error (°) | ≤1.8 | ≤0.9 | ≤0.5 | ≤1 | ≤5 | ≤3.1 | ≤5 |
| Area (µm²) | 12100 | 12000 | 21000 | N/A | 3000 | 1500 | 500 |
| Supply (V) | 0.9 | 1.2 | 1.2 | 1.2/0.88 | 1.1 | 1.1 | 1 |
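The power-efficiency row is the quoted power divided by the frequency at which it was reported. The short check below recomputes it from the table entries; the dictionary layout is just a convenient way to hold those numbers.

```python
# (power in mW, frequency in GHz, tabulated efficiency in mW/GHz)
designs = {
    "This work": (10.2, 10.0, 1.02),
    "Columbia JSSC'23": (7.8, 7.0, 1.11),
    "Columbia JSSC'22": (15.6, 7.0, 2.23),
    "Xilinx ISSCC'18": (10.0, 16.0, 0.63),
    "Intel ESSCIRC'16": (4.4, 2.0, 2.2),
    "UT Dallas CICC'11": (14.8, 10.0, 1.48),
    "IBM ISSCC'08": (2.6, 2.5, 1.04),
}

efficiency = {name: p / f for name, (p, f, _) in designs.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.3f} mW/GHz")
```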

### Conclusion

- A QCG featuring DCC, a DCDL, open-loop QEC, and digital automatic calibration is proposed
- The design provides an open-loop alternative for generating quadrature clocks and correcting residual phase errors
- The design is verified by simulation and measurement: ≤1.8° phase error over 5 to 10 GHz, 45 fs<sub>rms</sub> jitter contribution, and 10.2 mW at 10 GHz

### **Future Work**

- Fine phase control from 0 to 360° using quadrature clocks as the reference
  - de-skewing, alignment, clock & data recovery (CDR)
- Integration with the overall multi-lane transceiver system



### **Publication**

#### Conference

- Shaokang Zhao, Li Wang, and C. Patrick Yue, "Design of A 5–10 GHz Open-Loop Quadrature Clock Generator for High-Speed Wireline Systems," in 2025 IEEE 23rd Interregional NEWCAS Conference, 2025 (under review).

#### Journal

- Shaokang Zhao, Li Wang, and C. Patrick Yue, "A 5–10 GHz Quadrature Clock Generator with Open-loop Quadrature Error Correction in 28-nm CMOS," IEEE Solid-State Circuits Letters, 2025 (under review).

### Reference

- [1] Toprak-Deniz et al., "A 128-Gb/s 1.3-pJ/b PAM-4 Transmitter With Reconfigurable 3-Tap FFE in 14-nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 55, no. 1, pp. 19-26, Jan. 2020.
- [2] Z. Wang and P. R. Kinget, "A 65nm CMOS, 3.5-to-11GHz, Less-Than 1.45LSB-INLpp, 7b Twin Phase Interpolator with a Wideband, Low Noise Delta Quadrature Delay-Locked Loop for High-Speed Data Links," 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2022, pp. 292-294.
- [3] K. -h. Kim, P. W. Coteus, D. Dreps, S. Kim, S. V. Rylov and D. J. Friedman, "A 2.6mW 370MHz-to-2.5GHz Open-Loop Quadrature Clock Generator," 2008 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, USA, 2008, pp. 458-627.
- [4] Z. Zhang, G. Zhu, C. Wang, L. Wang and C. P. Yue, "A 32-Gb/s 0.46pJ/bit PAM4 CDR Using a Quarter-Rate Linear Phase Detector and a Self-Biased PLL-Based Multiphase Clock Generator," in IEEE Journal of Solid-State Circuits, vol. 55, no. 10, pp. 2734-2746, Oct. 2020.
- [5] S. Chen et al., "A 4-to-16GHz inverter-based injection-locked quadrature clock generator with phase interpolators for multi-standard I/Os in 7nm FinFET," 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2018, pp. 390-392.

- [6] Z. Wang, Y. Zhang, Y. Onizuka and P. R. Kinget, "A High-Accuracy Multi-Phase Injection-Locked 8-Phase 7GHz Clock Generator in 65nm with 7b Phase Interpolators for High-Speed Data Links," 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2021, pp. 186-188.
- [7] A. K. Mishra, Y. Li, P. Agarwal and S. Shekhar, "A 9b-Linear 14GHz Integrating-Mode Phase Interpolator in 5nm FinFET Process," 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2022, pp. 1-3.

# Acknowledgement

This work was supported in part by the Areas of Excellence Scheme of Hong Kong under grant AoE/E601/22-R, and in part by the General Research Fund of Hong Kong under grants no. 16205522 and no. 16205023.

# Acknowledgement

- Prof. C. Patrick Yue
- Prof. Wing Hung Ki and Prof. Fengbin Tu
- Groupmates from OWL
- My parents and my friends
- All the audience today





# Thanks!

Optical Wireless Lab

Department of Electronic and Computer Engineering

The Hong Kong University of Science and Technology (HKUST)

# **Proposed QCG: Duty Cycle Self Correction**

#### After QEC#2



#### After QEC#3



#### After QEC#4



# **Proposed QCG: Digitally Controlled Delay Line**

- ~1.2-ps step (≈4.3° at 10 GHz), 19.2-ps range





# **Proposed QCG: Duty Cycle Correction**

- Current mirror to modulate pull-up and pull-down strength
- Binary-implemented current segments for digital control





# **Proposed QCG: Duty Cycle Correction**

- Simulated DCC performance: ~0.13% step, ~8.5% range (TT)



# Proposed QCG: Duty Cycle Distortion-1

- When duty cycle distortion exists in the input clocks:
  - Decreasing/increasing trend in the load voltage waveforms → push transistors into triode region



# **Proposed QCG: Duty Cycle Distortion-2**

- When duty cycle distortion exists in the input clocks:
  - Decreasing/increasing trend in the load voltage waveforms → push transistors into triode region



# **Proposed QCG: Current mismatch**



# **Proposed QCG: Large Current**



# **Proposed QCG: Layout Implementation**

